Abstract 
Psychosocial functioning plays a key role in students’ wellbeing and performance inside and 
outside of school. As such, techniques designed to measure and improve psychosocial 
functioning factor prominently in school-based service delivery and research. Given that the 
different contexts (e.g., school, home, community) in which students exist vary in the degree to 
which they influence psychosocial functioning, educators and researchers often rely on multiple 
informants to characterize intervention targets, monitor intervention progress, and inform the 
selection of evidence-based services. These informants include teachers, students, and parents. 
Across research teams, domains, and measurement methodologies, researchers commonly 
observe discrepancies among informants’ reports. We review theory and research—occurring 
largely outside of school-based service delivery and research—that demonstrates how patterns of 
informant discrepancies represent meaningful differences that can inform our understanding of 
psychosocial functioning. In turn, we advance a research agenda to improve use and 
interpretation of informant discrepancies in school-based services and research. 

You are a school psychologist who is conducting a comprehensive evaluation for a 
student suspected of experiencing educational impairment due to an emotional disturbance. If 
found eligible, information gathered in the evaluation will be integral for developing educational 
programming as part of an individual education plan (IEP) for the student. Within the evaluation, 
you gather information via behavior rating scales administered to multiple adult authority figures 
in the student’s life (one parent and their math and history teachers). You observe that ratings 
between the teachers and the parent do not agree with one another. In this scenario, are the 
teachers’ reports “right” and the parent’s report “wrong”? Alternatively, is the parent’s report 
“right” and the teachers’ reports “wrong”? Is the student eligible for special education services if 
there is disagreement in observations of the student’s behavior between key authority figures in 
the student’s life? If found eligible, what should the behavioral goals and programming entail 
based on this discrepant information? The truth is that most school psychologists and educators 
lack the data needed to answer these questions. 

The above case illustration involves use of ratings from multiple informants to 
characterize a student’s psychosocial functioning to guide educational programming. This is a 
common assessment strategy in school-based services, especially for students with disabilities. 
When using this strategy, discrepancies commonly arise among informants’ ratings. Yet, our 
current technologies for assessing students’ psychosocial functioning do not allow us to 
definitively interpret the meaning (or lack thereof) of these informant discrepancies. 
Conventional wisdom suggests that informant discrepancies represent measurement error or rater 
biases, rendering one or more of the informants’ ratings invalid (De Los Reyes et al., 2015). In 
fact, common approaches for dealing with these discrepancies include analytic techniques that 
(a) focus on shared variance in latent space (i.e., emphasis on the degree to which ratings 
between informants overlap), (b) aggregate data from multiple informants into a single measured 
score, or (c) avoid multivariate data entirely in favor of a single “primary measure” completed by 
one “optimal” informant (see De Los Reyes, Kundey, & Wang, 2011). 

The uncertainty about how to interpret informant discrepancies represents a key barrier to 
educational diagnoses and identifying more precise and effective evidence-based programming 
tailored to student needs and the primary contexts in which they exist. Yet, what if these 
discrepancies ceased serving as barriers to interpreting assessment outcomes, and started serving 
as interpretive tools? In this paper, we discuss the impact of informant discrepancies on 
school-based services and research, and new areas of research and theory on how to improve their 
interpretability. We then advance a four-phase research agenda for studying how informant 
discrepancies might inform characterizing students’ psychosocial functioning and facilitate 
identification of more precise and effective evidence-based services. 

Conceptual Foundations of Assessments of Psychosocial Functioning 

Educational assessments often consist of an array of modalities (e.g., surveys, interviews, 
performance-based tasks) that tap into myriad content domains, such as intellectual functioning, 
impulsivity, reading level, and academic achievement (Merrell, 2008). Among these content 
domains, psychosocial functioning comprises a set of important determinants of outcomes across 
a range of disciplines (American Psychiatric Association, 2013; National Institute of Mental 
Health [NIMH], 2015). The World Health Organization (2018) defines psychosocial health as “a 
state of complete physical, mental, and social well-being, and not merely the absence of disease 
and infirmity.” In our paper, we conceptualize psychosocial functioning as comprising domains 
that connote adaptive functioning (i.e., promotive or protective factors that enhance positive 
outcomes) as well as domains that connote challenges to adaptive functioning (i.e., problem or 
risk factors that minimize positive outcomes). Along these lines, we review research on 
informant discrepancies in assessments across multiple domains of psychosocial functioning. 
Psychosocial functioning plays a key role in students’ wellbeing and performance inside 
and outside of school (e.g., Reynolds, Livingston, & Wilson, 2006). The different contexts in 
which students exist (e.g., school and home) also serve to influence and modify psychosocial 
functioning. As such, a key component of evidence-based assessments in school-based services 
and research involves reliably and validly assessing domains of psychosocial functioning (Cook, 
Volpe, & Delport, 2014). The ways in which students function psychosocially are not 
constrained by the start and stop of their school day. Social contexts inside and outside of the 
school system influence students’ psychosocial functioning (e.g., school, home, peer 
interactions; Cicchetti, 1984; Luthar et al., 2000). Further, any one student may display behaviors 
indicative of psychosocial functioning in some contexts, such as the classroom or peer 
interactions, to a greater degree than other contexts, such as the home or community (Dirks, De 
Los Reyes, Briggs-Gowan, Cella, & Wakschlag, 2012). In fact, contextual variations in displays 
of psychosocial functioning occur within a variety of domains relevant to academic performance, 
such as attention and hyperactivity, conduct problems, social anxiety, and social competence 
(e.g., De Los Reyes, Henry, Tolan, & Wakschlag, 2009; Deros et al., 2018; Dirks et al., 2012). 
For example, social contexts may differ in the degree to which they elicit displays of 
oppositional behaviors (e.g., noncompliance, defiance) due to differences across contexts in 
social structure, instructional demands, and emotional support (Dretzke et al., 2009). Thus, 
identifying the contexts in which children would benefit from improvements in their 
psychosocial functioning may facilitate more precise intervention programming and boost 
programming efficacy (e.g., NIMH, 2015). 
If a student varies in levels of psychosocial functioning across social contexts, it logically 
follows that no one person in that student’s life has the capacity to observe and accurately rate 
their functioning across all contexts. In fact, collecting reports from multiple informants about 
students’ psychosocial functioning is a common practice that happens in nearly every public 
school system (Reynolds et al., 2006; Salvia, Ysseldyke, & Bolt, 2012). In school-based 
services and research, data from multiple informants’ reports inform several areas of 
decision-making regarding educational programming, including special education eligibility, 
goals included in students’ IEPs, selection and delivery of supports, monitoring of student 
response to intervention, and placement decisions (Merrell, 2008). Indeed, federal law 
requires “Protection in Evaluation” procedures and defines a comprehensive evaluation as 
one that includes multiple methods derived from multiple informants in order to identify a 
student as eligible for special education services (Individuals with Disabilities Education 
Act, 2004). By gathering reports from different informants who vary in the specific contexts in 
which they observe and interact with students, school professionals may be able to gain a better 
understanding as to how students’ psychosocial functioning varies within and across contexts 
(see also Hunsley & Mash, 2007). In school-based services and research, the multi-informant 
approach typically involves soliciting reports from informants who presumably observe students’ 
behaviors as displayed within the school system (e.g., teachers), outside of the school system 
(e.g., parents), and across systems (e.g., students; see Kraemer et al., 2003). 

To provide data about students, informants may complete standardized instruments that 
ask them to rate how often a behavior has occurred over a specific period. The most common of 
these broadband instruments include behavior checklists such as the Achenbach System of 
Empirically Based Assessment (Achenbach & Rescorla, 2001), Behavior Assessment System 
for Children (Reynolds & Kamphaus, 2004), and Social Skills Improvement System-Rating 
Scales (Elliott & Gresham, 2008). Moreover, there are several narrowband instruments that 
educators and researchers regularly administer to gather reports from teachers and parents, 
including the Conners Rating Scales to assess hyperactivity, inattention, and disruptive behavior 
(Conners, 1997), Vineland Adaptive Behavior Scales (VABS) to assess adaptive behaviors 
(Sparrow, Balla, & Cicchetti, 1984), and Childhood Autism Rating Scale (CARS) to assess 
symptoms of autism (Schopler, Reichler, & Renner, 1988). These broadband and narrowband 
instruments are widely used because of their ease of administration and relative low cost 
compared to other more intensive assessment methods. Behavior rating scales, however, are 
indirect assessment methods, in that they do not yield indices of behavior at the time and place of 
their actual occurrence. Indeed, as we describe below, a key element of emerging research and 
theory on multi-informant assessments involves understanding links between informants’ reports 
and behavioral data collected on independent assessments (e.g., naturalistic observations and 
official records; De Los Reyes, Thomas, Goodman, & Kundey, 2013). 

In both service and research settings, collecting multiple informants’ reports generates a 
great deal of information about students’ psychosocial functioning. However, when compared, 
the individual reports from separate informants often yield inconsistent conclusions (De Los 
Reyes, 2011). These informant discrepancies are some of the most robust findings in the social 
sciences (Achenbach, 2006). Over the last 30 years, several large-scale epidemiological studies 
and meta-analytic reviews demonstrate that informant discrepancies occur across development 
(i.e., childhood, adolescence, and adulthood; Achenbach, Krukowski, Dumenci, & Ivanova, 
2005; Achenbach, McConaughy, & Howell, 1987; Duhig, Renk, Epstein, & Phares, 2000), 
cultures (De Los Reyes, Lerner, et al., 2019; Rescorla et al., 2014, 2017), psychosocial domains 
(e.g., autism, child maltreatment, parenting, suicide, social competence; Gresham et al., 2018; 
Jones et al., 2019; Korelitz & Garber, 2016; Renk & Phares, 2004; Romano, Weegar, 
Babchishin, & Saini, 2018; Stratis & Lecavalier, 2015), and measurement modalities (i.e., 
dimensional and categorical measures; De Los Reyes et al., 2015). In fact, these informant 
discrepancies are quite consistent across samples and studies. For instance, in a meta-analytic 
review of 341 studies on cross-informant correspondence in mental health assessments (De Los 
Reyes et al., 2015), the 95% confidence interval of correspondence levels using the r metric [.22, 
.33] includes magnitudes for which both the upper and lower bounds would have been classified 
by Cohen (1988) as falling in the “moderate” range. To be clear, this is not to say that variability 
in cross-informant correspondence levels does not exist. Indeed, correspondence levels among 
informants who observe behavior in the same context (e.g., pairs of teachers, pairs of parents) 
hover in the .40s-.60s range (Achenbach et al., 1987; De Los Reyes et al., 2015). Yet, even 
these magnitudes of correspondence do not rise to a level at which, in a given sample, any two 
informants’ reports become redundant with one another or yield the same conclusions (see also 
Achenbach, 2006). In sum, scores of studies conducted over the last few decades speak to how 
robustly informant discrepancies manifest in assessments of numerous domains of psychosocial 
functioning. 
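
To make these magnitudes concrete, the following minimal Python sketch computes cross-informant correspondence as a Pearson correlation and converts a correlation into the proportion of variance two reports share. The helper names (`pearson_r`, `shared_variance`) and the ratings are hypothetical illustrations; they do not come from any of the studies reviewed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def shared_variance(r):
    """Proportion of variance shared by two reports that correlate at r."""
    return r ** 2


# Hypothetical (made-up) parent and teacher ratings for eight students.
parent = [12, 5, 9, 14, 3, 8, 10, 6]
teacher = [4, 6, 11, 7, 5, 13, 3, 9]
r = pearson_r(parent, teacher)

# Bounds of the meta-analytic confidence interval cited in the text.
lower, upper = shared_variance(0.22), shared_variance(0.33)
print(round(lower, 3), round(upper, 3))  # prints 0.048 0.109
```

Read this way, correlations of .22 to .33 imply that roughly 5% to 11% of the variance in one informant's report is shared with the other's, which is why the two reports rarely yield interchangeable conclusions.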

Given their common occurrence, it should come as no surprise that informant 
discrepancies have historically created uncertainty when interpreting assessment outcomes. For 
instance, these discrepancies often make it difficult to determine whether an individual meets 
diagnostic criteria for externalizing or internalizing disorders. In an outpatient sample of children 
receiving services for externalizing concerns, prevalence estimates of DSM-IV subtypes 
(Primarily Inattentive, Primarily Hyperactive Impulsive, or Combined Type) of Attention 
Deficit/Hyperactivity Disorder (ADHD) varied widely, from zero children in the sample to the 
vast majority of children in the sample, depending on how one combined parent and teacher 
reports (i.e., “and” vs. “or” rule) and the instrument used to gather parent and teacher reports 
(i.e., interview vs. rating scale; see Table 5 of Valo & Tannock, 2010). Informant discrepancies 
also impact judgments about the efficacy of interventions in that intervention effects may range 
from “small” to “large” depending on the informant (for reviews, see De Los Reyes & Kazdin, 
2006, 2009). Moreover, researchers observe informant discrepancies when assessing severity of 
symptoms of students with autism spectrum disorders, leading some to conclude that their 
research “support[s] the role of environmental context in psychiatric symptom expression in 
children affected by autism and suggest[s] that informant discrepancies may provide critical cues 
for these children via specific environmental modifications” (Kanne, Abbacchi, & Constantino, 
2009). Clearly, these discrepancies greatly influence intervention programming and 
interpretation of intervention outcomes and may lead to differential conclusions regarding the 
efficacy of interventions. Importantly, informants within (e.g., across teachers) versus outside 
(e.g., parents) of the school system often vary as to whether they perceive school interventions as 
yielding beneficial effects in children’s behavior, and these informants also comprise key 
stakeholders in the administration and outcomes of school-based interventions. Thus, failing to 
properly attend to informant discrepancies can have profound implications for school-based 
services and research. 
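
The sensitivity of prevalence estimates to the “and” versus “or” combination rule described above can be illustrated with a short Python sketch. The function name `prevalence` and the ten parent and teacher classifications are hypothetical; they are chosen only to show how the same set of reports can produce very different estimates, and they do not reproduce data from any cited study.

```python
def prevalence(parent_positive, teacher_positive, rule):
    """Proportion of children classified as meeting criteria under a rule.

    rule="and": both informants must endorse criteria.
    rule="or":  endorsement by either informant is sufficient.
    """
    if rule == "and":
        combined = [p and t for p, t in zip(parent_positive, teacher_positive)]
    elif rule == "or":
        combined = [p or t for p, t in zip(parent_positive, teacher_positive)]
    else:
        raise ValueError("rule must be 'and' or 'or'")
    return sum(combined) / len(combined)


# Hypothetical classifications for ten children (True = meets criteria
# according to that informant). Parents and teachers never endorse the
# same child in this made-up sample.
parent_positive = [True, True, False, False, True,
                   False, False, True, False, False]
teacher_positive = [False, False, True, False, False,
                    True, False, False, False, True]

print(prevalence(parent_positive, teacher_positive, "and"))  # prints 0.0
print(prevalence(parent_positive, teacher_positive, "or"))   # prints 0.7
```

In this contrived sample the “and” rule identifies no children while the “or” rule identifies most of them, mirroring the range of estimates that can arise when informants disagree.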

Despite the historic difficulty in understanding and interpreting informant discrepancies, 
their robust presence in school-based services and research makes intuitive sense. As mentioned 
previously, students’ social contexts (e.g., classrooms, home environment) may vary 
considerably in eliciting the behaviors that impact school performance. Given these variations, 
informants embedded in different social contexts (e.g., parents vs. teachers) should also differ in 
how they observe and thus rate students’ behavior. The intuitive nature of informant 
discrepancies and their ubiquity in school-based services and research make it even more 
surprising that, in practice, we lack consensus guidelines on how to interpret informant 
discrepancies (see also Beidas et al., 2015). The lack of guidelines for this crucial element of 
service provision likely stems from long-held traditions in measurement construction (e.g., 
pursuit of agreement across raters) and interpretation of multivariate data (for a review, see De 
Los Reyes et al., 2015). That being said, an emerging body of research—occurring in disciplines 
largely outside of school-based service delivery and research—reveals new insights into these 
informant discrepancies. Specifically, the degree of discrepancy among informants’ reports about 
behavior yields important information about that very behavior. In fact, informant discrepancies 
may reflect individual differences in displays of behavior based on the context(s) in which they 
occur and the psychosocial phenomena they reflect. Theoretical models exist to explain these 
discrepancies (De Los Reyes, Thomas et al., 2013), and empirical paradigms offer ways to model 
or test these discrepancies (e.g., Lerner, De Los Reyes, Drabick, Gerber, & Gadow, 2017). 
Below we review recent advancements in theory on informant discrepancies, as well as an 
emerging evidence base that supports this theoretical work. We then lay the groundwork for a 
four-phase research agenda for which the central goal is to improve the interpretability of 
informant discrepancies in assessments of psychosocial functioning in school-based services and 
research. 

Recent Theoretical Work on Informant Discrepancies: Operations Triad Model 

Many challenges arising from use of multi-informant assessments stem from the historic 
lack of conceptual models for interpreting informant discrepancies (for a review, see De Los 
Reyes, Thomas, et al., 2013). Specifically, Converging Operations is the dominant conceptual 
model for interpreting multivariate data (Garner, Hake, & Eriksen, 1956). Converging 
Operations holds that one interprets the veracity of a study’s findings—and by extension, 
multiple data points collected for an individual assessment—based on the extent to which 
findings from multiple methodologically distinct data sources (e.g., different informants’ reports) 
converge and point to the same conclusion. In fact, data triangulation through obtaining reports 
from multiple informants is a key component of “best practices” advocated in school psychology 
assessment textbooks (e.g., Sattler & Hoge, 2006). As a framework, Converging Operations 
drives its users toward removing or otherwise discounting source-specific variance (e.g., 
discounting a parent’s report based on the judgment of the school psychologist; see also De Los 
Reyes et al., 2015). This can be seen in empirical approaches described previously that focus on 
shared variance and eschew, dismiss, or otherwise nullify use of unshared or informant-specific 
variance (De Los Reyes, Kundey, & Wang, 2011). 

Importantly, these approaches informed by Converging Operations lie in stark contrast to 
the very reasons why one collects multi-informant data. Indeed, in school-based services and 
research, one takes a multi-informant approach to assessment based on the ideas that (a) students 
behave differently, depending on the social context or environmental demands; and (b) 
informants often vary in where they observe the students about whom they provide behavioral 
reports (Achenbach et al., 1987). Classrooms, for example, are performance settings in which 
students are expected to arrive on time, be prepared with materials, and engage in academic 
instruction and activities. In contrast, at home students have access to their belongings and often 
spend a great deal of time at rest or engaging in leisure and recreational activities. These 
assessment conditions thus raise the question: Why would one expect multiple informants’ reports 
of the same student’s behavior to yield the same conclusion? Why would researchers place 
exclusive focus on estimating common variance? If multiple informants’ reports potentially yield 
contextually sensitive data from discrepancies among these reports, then focusing exclusively on 
estimating common variance may result in (a) loss of valuable information about the behaviors 
targeted for assessment, (b) erroneous eligibility decisions, and thus (c) less effective 
programming. 

One organizing framework for using and interpreting multi-informant assessments is the 
Operations Triad Model (De Los Reyes, Thomas, et al., 2013). We graphically depict this 
framework in Figure 1. The Operations Triad Model expands upon the Converging Operations 
concept (Figure 1a) by delineating conditions for identifying patterns of multi-informant 
assessment outcomes that reflect Converging Operations, as well as two alternative concepts. 
First, the Operations Triad Model specifies conditions for Diverging Operations (Figure 1b), a 
set of measurement conditions by which multiple informants’ reports yield discrepant findings, 
and the discrepancies reflect meaningful variation in the behaviors being assessed. An example 
of a Diverging Operations scenario might involve a teacher reporting hyperactivity and 
noncompliance concerns in a student that go uncorroborated by a parent’s report, with the 
discrepancies occurring because the student primarily displays hyperactivity and noncompliance 
when interacting with peers in the context of classroom activities (i.e., few concerns displayed at 
home). Second, the Operations Triad Model specifies conditions for Compensating Operations 
(Figure 1c), a set of measurement conditions by which multiple informants’ reports yield 
discrepant findings, and the discrepancies reflect mundane methodological features of the 
assessment procedures. For instance, Compensating Operations might manifest if a teacher and 
student provide reports using measures of a psychosocial domain like social competence that 
differ in item content and scaling. Consequently, the two reports may have diverged because 
their measures differed in terms of their psychometric properties. The ability of the Operations 
Triad Model to distinguish informant discrepancies that reflect meaningful versus mundane 
reasons (i.e., Diverging Operations vs. Compensating Operations) is a key element of the 
framework. Indeed, this distinction provides the conceptual foundation to guide direct empirical 
tests of informant discrepancies, as well as tests of the assumptions that often underlie use of 
commonly leveraged multivariate techniques (e.g., structural equation modeling and use of 
composite scores to aggregate data). 

More broadly, embedded in the Operations Triad Model is a set of measurable, testable 
conditions for distinguishing assessment outcomes that reflect Converging Operations, Diverging 
Operations, or Compensating Operations, which we present in Figure 2. Specifically, the 
Operations Triad Model facilitates the process by which one poses a priori hypotheses as to 
whether they expect to observe converging findings or diverging findings among a set of 
multiple informants’ reports (Figure 2a). Empirical questions outlined in the figure can then 
guide tests of these expectations. For instance, these questions can be used to determine if the 
evidence supports the a priori expectation of converging findings (i.e., Converging Operations; 
Figure 2b). Questions can also be posed to determine whether the evidence supports the a priori 
expectation of diverging findings as yielding meaningful information about behavior (i.e., 
Diverging Operations; Figure 2c). If the evidence fails to support either of these hypotheses, one 
can proceed to testing whether the observations are best explained by measurement error (i.e., 
Compensating Operations; Figure 2d). 
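
The sequence of questions described in this paragraph can be sketched as a simple decision function. This is a simplified illustration and not a reproduction of Figure 2: the function name `classify_outcome` and its two Boolean inputs are hypothetical stand-ins for the empirical tests a researcher would actually conduct (e.g., whether reports agree, and whether discrepancies relate to independent evidence of contextual variation in behavior).

```python
def classify_outcome(expected, reports_agree, discrepancy_tracks_context):
    """Map an a priori hypothesis and (simplified) evidence to an outcome.

    expected: "converge" or "diverge", the a priori hypothesis.
    reports_agree: True if informants' reports yield the same conclusion.
    discrepancy_tracks_context: True if discrepancies relate to independent
        evidence that behavior varies across contexts (an assumed proxy).
    """
    if expected not in ("converge", "diverge"):
        raise ValueError("expected must be 'converge' or 'diverge'")
    # Evidence supports the a priori expectation of converging findings.
    if expected == "converge" and reports_agree:
        return "Converging Operations"
    # Evidence supports discrepancies as meaningful information.
    if expected == "diverge" and not reports_agree and discrepancy_tracks_context:
        return "Diverging Operations"
    # Neither hypothesis supported: examine methodological explanations.
    return "Test for Compensating Operations"


print(classify_outcome("diverge", False, True))   # prints Diverging Operations
print(classify_outcome("diverge", False, False))  # prints Test for Compensating Operations
```

The final branch returns a next step rather than a conclusion, because the text describes Compensating Operations as something to be tested once the a priori hypothesis goes unsupported.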

In sum, the Operations Triad Model provides researchers with an evidence-based 
approach for using and interpreting multi-informant assessments. Using this framework, 
researchers can test whether their multi-informant approach yields data that converge on a 
common outcome or diverge toward different outcomes for meaningful or mundane reasons (cf. 
Figures 1b and 1c). Further, the Operations Triad Model has heuristic value. As evidence of this, 
consider the recent development of modified versions of the Operations Triad Model for 
interpreting (a) physiological data in mental health assessments (De Los Reyes & Aldao, 2015) 
and (b) developmental assessments of family functioning (De Los Reyes & Ohannessian, 2016; 
De Los Reyes, Ohannessian, & Racz, 2019). 

Empirical Work Supporting the Operations Triad Model 

An emerging body of empirical work across multiple disciplines supports central tenets 
of the Operations Triad Model. The first tenet is that, if informant discrepancies yield systematic 
information about psychosocial domains, then magnitudes or levels of these discrepancies should 
co-vary with basic characteristics of both the constructs/domains assessed by informants’ reports 
and the social contexts in which informants observe behavior. For example, basic 
psychometric work indicates that when collecting multiple raters’ reports of observed behavior, 
one should observe the greatest inter-rater agreement for behaviors that are relatively easier to 
observe (e.g., aggression vs. low mood; Groth-Marnat & Wright, 2016; Hunsley & Lee, 2010). 
Consistent with this expectation, two large-scale meta-analyses that collectively reviewed 50 
years of research on cross-informant correspondence in reports of youth mental health observed 
greater correspondence levels for reports about externalizing difficulties relative to internalizing 
difficulties (Achenbach et al., 1987; De Los Reyes et al., 2015). Another expectation is that 
informants rating behaviors based on observations from the same context should correspond in 
their reports to a greater degree than informants who base their ratings on observations from 
different contexts (Kraemer et al., 2003). In fact, the two meta-analyses described previously 
also revealed greater correspondence between reports by informants from the same context (e.g., 
pairs of parents; pairs of teachers) relative to reports by informants from different contexts (e.g., 
mother and teacher; Achenbach et al., 1987; De Los Reyes et al., 2015). 

A second tenet is that, for informant discrepancies to yield meaningful information about 
psychosocial domains, levels of these discrepancies need to vary among informant pairs. That is, 
low correspondence characterizes multiple informants’ reports in general. Nevertheless, in 
addition to pairs who provide very different reports, there should also exist some informant pairs 
who provide quite similar reports. To test these ideas, researchers often leverage person-centered 
models of data analysis (e.g., latent class analysis; McCutcheon, 1987) to examine whether 
samples with multiple informants’ reports contain subgroups of informant pairs who vary in the 
levels of discrepancies between their reports. In fact, among many informant pairs (e.g., 
parent-child, teacher-parent, mother-father) and domains (e.g., internalizing and externalizing 
difficulties, parenting and parental monitoring, family conflict, social competence), prior work 
identifies at least three kinds of reporting patterns, in which informants (a) converge on reports 
of high levels of psychosocial functioning; (b) diverge, such that one informant reports lower 
levels of psychosocial functioning relative to the other informant; or (c) converge on reports of 
low levels of psychosocial functioning (e.g., De Los Reyes et al., 2015). For example, in two studies about 
assessments of children’s conduct problems, researchers identified four groups of teacher-parent 
dyads: (a) both teacher and parent agree on low conduct problems, (b) teacher (but not parent) 
reports high conduct problems, (c) parent (but not teacher) reports high conduct problems, and 
(d) both teacher and parent agree on high conduct problems (Fergusson, Boden, & Horwood, 
2009; Sulik et al., 2017). 
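
The four teacher-parent patterns just described can be illustrated with a short Python sketch. Note the simplifying assumption: the cited studies identified subgroups with person-centered models such as latent class analysis, whereas this sketch substitutes a fixed cutoff score purely for illustration. The function name `dyad_pattern`, the cutoff of 12, and the scores are all hypothetical.

```python
from collections import Counter


def dyad_pattern(teacher_score, parent_score, cutoff):
    """Label a teacher-parent dyad by who reports high conduct problems.

    A score at or above the (hypothetical) cutoff counts as "high".
    """
    teacher_high = teacher_score >= cutoff
    parent_high = parent_score >= cutoff
    if teacher_high and parent_high:
        return "both high"
    if teacher_high:
        return "teacher only"
    if parent_high:
        return "parent only"
    return "both low"


# Hypothetical (teacher_score, parent_score) pairs for six students.
dyads = [(18, 4), (3, 2), (15, 16), (5, 17), (2, 6), (14, 3)]
counts = Counter(dyad_pattern(t, p, cutoff=12) for t, p in dyads)

print(counts["both low"], counts["teacher only"],
      counts["parent only"], counts["both high"])  # prints 2 2 1 1
```

A fixed cutoff is easier to read than a latent class model but imposes the group boundaries in advance; a person-centered analysis would instead estimate the subgroups from the data.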


A third tenet is that not only should there exist individual differences in levels of 
informant discrepancies, but these individual differences should also convey meaningful 
information about the psychosocial domains for which informants provide reports. Recent work 
reveals how informant discrepancies convey these kinds of meaningful information. Specifically, 
informant discrepancies may inform how one characterizes assessed behaviors. That is, beyond 
the information provided by individual informants’ reports of psychosocial functioning, 
understanding patterns of agreement and disagreement across informants’ reports reveals 
important aspects of such functioning that cannot be obtained by simply interpreting reports 
individually or additively. For example, in assessments of disruptive behavior in early childhood, 
when parents endorse disruptive behavior that teachers do not endorse, these discrepancies tend 
to point to children displaying observed disruptive behavior with parents but not non-parental 
adults on independent, contextually sensitive laboratory tasks (De Los Reyes et al., 2009). In this 
study, teacher-endorsed disruptive behavior that went uncorroborated by parent report pointed to 
children displaying observed disruptive behavior with non-parental adults and not parents, and 
children for whom both teacher and parent endorsed disruptive behavior tended to display 
observed disruptive behavior across interactions with parents and non-parental adults. In short, 
this study found that informant discrepancies provided information about contextual variations in 
the assessed behavior. That is, informant discrepancies may not reflect measurement error but 
rather meaningful differences in displays of the child’s behavior across social environments. 
Similar findings manifest between (a) parent and teacher reports in assessments of aggressive 
behavior in relatively older children (Hartley, Zakriski, & Wright, 2011); (b) parent and teacher 
reports of childhood autism spectrum symptoms (Lerner et al., 2017); (c) parent and adolescent 
reports in assessments of adolescent social anxiety (Deros et al., 2018; Glenn et al., 2019); (d) 
parent and youth reports in assessments of youth depression collected in school settings (Makol 
& Polo, 2018); (e) parent and adolescent reports of adolescent psychosocial functioning in the 
context of screening assessments for pediatric asthma (Al Ghriwati, Winter, Greenlee, & 
Thompson, 2018); (f) parent and adolescent reports of adolescent internalizing concerns 
completed at admission to an inpatient unit (Makol, De Los Reyes, Ostrander, & Reynolds, 
2019); and (g) cross-parental reports in non-clinic, community assessments of adolescent 
psychosocial functioning (De Los Reyes, Alfano, Lau, Augenstein, & Borelli, 2016). 
Research Agenda for Informant Discrepancies in School-Based Services and Research 
The body of research and theory reviewed previously supports the idea that informant 
discrepancies in reports of students’ psychosocial functioning may contain meaningful 
information about this functioning. Yet, a hallmark principle of evidence-based assessment is 
that one should not assume that the psychometric properties of assessments generalize to all 
assessment occasions or service populations (Groth-Marnat & Wright, 2016; Hunsley & Mash, 
2007). In line with this, we propose a four-phase agenda for improving the interpretability of 
informant discrepancies in assessments of students’ psychosocial functioning in school-based 
services and research. This four-phase agenda reflects the translational research spectrum (see 
Fishbein, Ridenour, Stahl, & Sussman, 2016) beginning with more basic science and culminating 
with implementation science to incorporate findings into routine practice. In Figure 3 we 
summarize each phase in the agenda. Below and for each phase in the agenda, we either cite 
prior work that provides examples of studies that meet the aims and scope of research conducted 
in the phase, or explain how one might conduct research in a phase within school-based 
populations. 
Phase 1: Generalizability of the Operations Triad Model to School-Based Assessments 


Phase 1 of the research agenda involves testing whether the Operations Triad Model 
generalizes to school-based assessments of students’ psychosocial functioning. That is, 
low-to-moderate levels of cross-informant correspondence manifest in school-based assessments of a 
host of psychosocial domains (e.g., Lee, Elliott, & Barbour, 1994). Further, for several measures 
of psychosocial functioning there exists documented evidence of the presence of informant 
discrepancies when these measures are used in school-based assessments. Specifically, Table 1 
includes a list of several of these instruments. To identify these instruments, we reviewed the 341 
articles examined in a recent meta-analysis of child and adolescent mental health (De Los Reyes 
et al., 2015). The measures reported in Table 1 were those used in studies included in the 
meta-analysis that (a) collected multi-informant data on the measure, (b) estimated levels of 
cross-informant correspondence in reports on that measure, and (c) did so using a school-based sample. 

However, as mentioned previously, research informed by the Operations Triad Model 
finds that underlying low levels of cross-informant correspondence, there exist considerable 
between-dyad variations or subgroups of multi-informant reports in terms of the level (e.g., 
convergence vs. divergence) and direction (e.g., teacher > parent and parent > teacher) of 
informant discrepancies (see also De Los Reyes et al., 2019). Do the subgroups or patterns 
of multi-informant reports described previously also manifest when assessing psychosocial 
functioning in school-based settings? For example, in large samples of school-based assessments 
of students’ specific externalizing behaviors (e.g., aggression, noncompliance, hyperactivity), 
can one reliably identify “pockets” of teacher-parent dyads in these samples who converge in 
reports of high levels of externalizing concerns; diverge such that teachers report higher concerns 
relative to parents, and vice versa; or converge in reports of low levels of externalizing concerns? 
If school-based assessments yield between-dyad differences in patterns of multi-informant 
reports, the key question would be: Do these differences meaningfully relate to variations in 
psychosocial phenomena relevant to school-based services and research? 

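As a purely illustrative sketch of the four dyad patterns described above, the Python snippet below sorts teacher-parent dyads into subgroups using one cutoff applied to each informant's score. The cutoff value, the function name `classify_dyad`, and the example scores are all hypothetical. A fixed threshold stands in here for the person-centered models (e.g., latent class analysis) that research in this area often uses to derive such subgroups.

```python
from collections import Counter

def classify_dyad(teacher_score, parent_score, cutoff):
    """Label one teacher-parent dyad by level and direction of discrepancy.

    `cutoff` is a hypothetical threshold for an elevated externalizing score.
    """
    teacher_high = teacher_score >= cutoff
    parent_high = parent_score >= cutoff
    if teacher_high and parent_high:
        return "converge_high"        # both informants report high concerns
    if teacher_high:
        return "teacher_gt_parent"    # only the teacher reports high concerns
    if parent_high:
        return "parent_gt_teacher"    # only the parent reports high concerns
    return "converge_low"             # both informants report low concerns

# Hypothetical (teacher, parent) scores for five students.
dyads = [(70, 72), (68, 50), (48, 66), (45, 47), (52, 71)]
patterns = [classify_dyad(t, p, cutoff=65) for t, p in dyads]
pattern_counts = Counter(patterns)
print(pattern_counts)
```

Counting how many dyads fall into each pattern is only the first step. The Phase 1 question is whether membership in a pattern relates to independent criterion measures.
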
Several recent studies cited previously from research groups in clinical psychology, 
developmental psychology, and psychiatry reveal strategies for addressing these questions (Al 
Ghriwati et al., 2018; Lerner et al., 2017; Makol & Polo, 2018). Each of these studies had one 
thing in common: They tested links between patterns of multi-informant reports and independent 
measures of psychosocial phenomena relevant to the instruments on which informants provided 
reports. By “independent measures,” we mean data derived from modalities and/or information 
sources that do not overlap with the informants completing ratings about students’ psychosocial 
functioning. In this way, one can leverage these independent instruments as criterion measures to 
test in relation to patterns of multi-informant reports, without the confound of shared method 
variance between predictor and criterion variables (i.e., criterion contamination; see De Los 
Reyes et al., 2015). For instance, consider a study for which a key aim was to test whether 
discrepancies between parent and teacher survey reports of social competence reflected 
variations in the degree to which students displayed low competence in home and/or school 
contexts. If parents and teachers also completed the criterion measures (e.g., number of friends 
with whom the student interacts), then any links between patterns of multi-informant reports and 
the criterion measures could be parsimoniously explained by the same two informants providing 
data for both the predictor and criterion measures. 

Needless to say, addressing issues of criterion contamination also raises issues of 
feasibility for testing the generalizability of the Operations Triad Model in school-based settings. 
Indeed, previous studies testing elements of the Operations Triad Model often used scores taken 
from performance-based tasks or independent observers’ ratings of behavior on 
laboratory-controlled social interaction tasks (e.g., De Los Reyes et al., 2009; De Los Reyes, Alfano, Lau, 
Augenstein, & Borelli, 2016; De Los Reyes, Bunnell, & Beidel, 2013; Glenn et al., 2019; Lerner 
et al., 2017). These laboratory tasks might be difficult to implement in school-based settings. 
Importantly, tests of the Operations Triad Model are not limited to use of these laboratory-based 
criterion measures. The key issues regarding research on the Operations Triad Model are 
twofold. The first is minimizing or avoiding the criterion contamination issues described previously. 
The second is identifying criterion measures that reflect variations in psychosocial phenomena 
relevant to the constructs about which informants provide reports. Regardless of the criterion 
measures one implements, the purpose is to test whether patterns of multi-informant reports 
“match” variations in behavior observed on the criterion measures. 

An example here may be helpful. Consider a study focused on testing the meaning of 
informant discrepancies in school-based assessments of students’ externalizing behaviors. For 
those students for whom parents report externalizing concerns that teachers’ reports do not 
corroborate, are those students also likely to experience associated features of externalizing 
concerns at home (e.g., maladaptive parenting; Hunsley & Lee, 2010)? Further, are those 
students, at the same time, experiencing protective factors at school (e.g., teacher displays proper 
classroom management skills) or the absence of associated features of externalizing concerns at 
school (e.g., rejection by peers during classroom activities or social isolation)? For those students 
for whom teachers report externalizing concerns that parents’ reports do not corroborate, are they 
also likely to experience associated features of externalizing concerns at school to a greater 
degree than at home? Do students for whom both teachers and parents report externalizing 
concerns experience associated features of such concerns across home and school contexts? The 
answers to these questions allow one to discern whether patterns of multi-informant reports of 
students’ externalizing behaviors contain meaningful information. In this way, Phase 1 addresses 
whether informant discrepancies have the potential to inform school-based assessments of 
students’ psychosocial functioning for the purposes of eligibility determination and the selection 
and delivery of evidence-based programming. 

Phase 2: Strategies for Optimizing Multi-Informant Assessments to Yield Interpretable 
Data about Informant Discrepancies in School-Based Services and Research 

In many respects, Phase 1 of this proposed research agenda focuses on the basic science 
of informant discrepancies in school-based settings. That is, the work reviewed previously, 
conducted largely outside school-based service and research settings, reveals both the presence 
of informant discrepancies and the prospect that they yield key indices reflecting psychosocial 
functioning. However, all of this work involved leveraging instruments that were not designed to 
sensitively assess informant discrepancies. In fact, researchers developed current multi-informant 
instruments as one would from a Converging Operations perspective: A parallel format with the 
same items, response options, and item content largely held constant across informants. 

In prior work, use of parallel formats across informants’ reports greatly facilitated study 
of informant discrepancies. Indeed, parallel formats allow researchers to parsimoniously rule out 
changes in instrumentation as an explanation for discrepancies observed between informants’ 
reports (De Los Reyes, Thomas, et al., 2013). However, the foundation for use of parallel 
construction of multi-informant measures largely rests on the idea that the constructs about 
which informants provide reports manifest in much the same way across informants—and by 
extension—across contexts; tests of this foundational assumption are often referred to as tests of 
measurement invariance (Dirks et al., 2014; Olino, Finsaas, Dougherty, & Klein, 2018; Russell, 
Graham, Neill, & Weems, 2016). Yet, decades of research on informant discrepancies indicate 
otherwise: Several meta-analyses published over the last 30 years reveal that informants often 
rate behavior in fundamentally different ways across social contexts (e.g., Achenbach et al., 
1987; De Los Reyes et al., 2015; Duhig et al., 2000). Further, exclusively taking a parallel format 
approach to multi-informant assessment translates into developing measures designed to 
minimize informant discrepancies, by deleting items upon which informants systematically differ 
in item responses. It is no wonder that anyone using these measures would often find themselves 
confronted with perhaps a misguided task: Determining which informant is providing the “valid 
report” and which ones are providing “biased or skewed reports” (see also De Los Reyes, 
Youngstrom, et al., 2011). 

The findings of prior work, as robust as they are, may be grossly underestimating the 
impact of informant discrepancies on services and research geared toward understanding and 
intervening to improve psychosocial functioning. Thus, for informant discrepancies research 
conducted at the sample or group level to translate to routine assessments conducted in 
school-based settings, we require a second phase of work (i.e., Phase 2) to focus on measurement 
development and/or refinement of existing measures of students’ psychosocial functioning. 

In Phase 2 of our agenda, we call for a significant paradigm shift in how we 
conceptualize and approach measurement development and testing of multi-informant reports in 
school-based services and research. Multi-informant assessments may, in theory, allow school 
professionals to identify the specific contexts in which students display behaviors indicative of 
psychosocial functioning. Yet, in practice existing multi-informant instruments may be ill-suited 
to sensitively identify, on an item-specific basis, behaviors that an individual student might 
display specifically at home, at school, or consistently across contexts. We require this item-level 
certainty in measurement in order to ensure that informant discrepancies can meaningfully 
inform service delivery and research. 

We propose that school-based service and research settings require measures of 
psychosocial functioning that contain items that vary as to their context specificity. That is, 
such measures would contain two kinds of items. The first, discrepant items, include those items upon 
which informants are unlikely to agree, and are more likely to reflect context-specific 
behaviors. An example of such an item might be “avoids speaking in groups,” in which the 
behaviors described may systematically manifest to a greater degree in school, a setting in 
which school-based observers (e.g., teachers or school counselors) have more opportunities 
to observe students behave in social contexts relevant to these behaviors (e.g., group 
activities during class). The second, nondiscrepant items, include those items upon which 
informants are likely to agree, and may reflect cross-contextual behaviors. An example of 
such an item might be “helps others when asked.” Here, the behaviors described reflect a 
construct (e.g., prosocial behavior) that, when displayed, may systematically manifest to 
such a degree that multiple informants, regardless of social context, would have sufficient 
observations or “samples” of behavior about which to provide reports (e.g., helping fellow 
students at school, helping siblings at home or relatives at family gatherings). Equipped with 
these items, service providers and researchers could construct scales that assess behaviors 
that are bound to a given context (i.e., discrepant), or behaviors that are contextually 
unbound and thus displayed across contexts (i.e., nondiscrepant). 

Importantly, the approach we advance here and describe in further detail below seeks 
to unify the various traditions for integrating multi-informant data described previously. 
Specifically, by calling for measures that include scales with nondiscrepant items, we 
embrace the importance of tests of measurement invariance, which have the key goal of 
identifying and retaining data about behaviors that manifest across informants’ perspectives 
(see Dirks et al., 2014; Olino et al., 2018; Russell et al., 2016). At the same time, the idea to 
create scales that include items for which informants provide differential responses (i.e., 
discrepant items) embraces the need to retain information germane to individual informants’ 
unique perspectives (see also Becker-Haimes, Jensen-Doss, Birmaher, Kendall, & Ginsburg, 
2018; Kraemer et al., 2003; Laird & De Los Reyes, 2013; Laird & Weems, 2011). 

The combined, integrative approach we advance here would enable the development 
of specific interpretative guidelines for the purposes of diagnosis and evidence-based 
programming. In particular, data from discrepant scales could triangulate on characterizing and 
indexing changes in context-specific behaviors in response to school and/or home supports. In 
contrast, data from nondiscrepant scales might characterize and index changes in behaviors that 
cut across contexts or are not necessarily bound to any given social environment. Both of these 
scales could be used to pinpoint the selection and delivery of interventions, develop behavioral 
goals, and monitor intervention response in order to inform data-driven decisions. We see many 
possibilities for use of such scales in school-based services and research. 

What techniques might we leverage to identify context-sensitive items of students’ 
psychosocial functioning? The answer may lie in an innovative application of existing analytic 
techniques. Specifically, for decades researchers have leveraged indices of inter-rater agreement 
(IRA) and differential item functioning (DIF) to test whether scores obtained from different 
informants are interchangeable or equivalent in terms of their absolute value (IRA) or response 
probability (DIF; see Andrich & Hagquist, 2012; LeBreton, Burgess, Kaiser, & James, 2003). 
The typical application of these techniques involves identifying items that are subject to 
differential response. Predicated on the notion that differential response signals rater or statistical 
biases, measurement development has historically involved pruning out those items that respond 
differentially across informants or raters. In stark contrast, we call for leveraging these 
techniques to: (a) explicitly identify items on which different informants make differential 
responses (i.e., discrepant), (b) distinguish those items from items on which informants provide 
similar responses (i.e., nondiscrepant), and (c) include both kinds of items on scales developed to 
measure psychosocial functioning. 

How might one apply these techniques to identify discrepant and nondiscrepant items? 
To illustrate, we describe below how one might implement three different quantitative metrics to 
identify discrepant and nondiscrepant items: (a) the inter-rater agreement $r_{WG}$ index, (b) standardized 
mean difference (SMD), and (c) DIF. First, one of the most popular estimates of IRA is James, 
Demaree, and Wolf’s (1984) single-item $r_{WG}$. We can depict the $r_{WG}$ index, which defines 
agreement in terms of the proportional reduction in error variance, by the equation: 

$$r_{WG} = 1 - \frac{S_X^2}{\sigma_E^2} \qquad (1)$$

where $S_X^2$ is the observed variance on the variable $X$ (e.g., ratings of noncompliance or positive 
affect) taken over $K$ different informants or raters and $\sigma_E^2$ is the variance expected when there is 
a complete lack of agreement among the informants. One can calculate the degree of agreement 
by comparing the observed variance to the variance expected when informants respond 
randomly. Using this formula, when all informants are in perfect agreement, the observed 
variance among informants is 0, and $r_{WG}$ = 1.0. In contrast, when informants are in total lack of 
agreement, the observed variance will asymptotically approach the error variance, which leads 
$r_{WG}$ to incrementally approach 0.0. Consistent with guidelines for acceptable IRA based on 
correlational metrics (LeBreton et al., 2003), one can identify an item as nondiscrepant with $r_{WG}$ 
> .70. Conversely, the decision rule to identify discrepant items might be $r_{WG}$ < .70. 

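To make the agreement index concrete, the following Python sketch computes a single-item value for one student and applies the .70 decision rule. It is a minimal illustration under stated assumptions: the expected error variance uses a uniform "random responding" null, for which the variance of an item with A response options is (A^2 - 1) / 12; negative values are truncated to 0; and the function names and ratings are hypothetical.

```python
from statistics import variance

def rwg_single_item(ratings, n_options):
    """Single-item agreement: 1 - (observed variance / expected error variance).

    ratings   : scores from the K informants on one item (K >= 2)
    n_options : number of response options A on the item
                (assumption: uniform null, expected variance = (A^2 - 1) / 12)
    """
    observed_var = variance(ratings)            # sample variance across informants
    expected_var = (n_options ** 2 - 1) / 12    # variance under random responding
    rwg = 1 - observed_var / expected_var
    return max(0.0, rwg)                        # assumption: truncate negatives to 0

def label_item(rwg, threshold=0.70):
    """Decision rule from the text; a value of exactly .70 is treated as discrepant here."""
    return "nondiscrepant" if rwg > threshold else "discrepant"

# Hypothetical teacher and parent ratings of one student on a 5-point item.
close = rwg_single_item([4, 5], n_options=5)    # 1 - 0.5 / 2.0 = 0.75
far = rwg_single_item([1, 5], n_options=5)      # 1 - 8.0 / 2.0 is negative, so 0.0
print(close, label_item(close))
print(far, label_item(far))
```

In a sample of many students, one option is to compute this value per student and summarize it per item before applying the threshold. How to aggregate is a design choice the text leaves open.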

Overall, $r_{WG}$ is a tool for identifying discrepant and nondiscrepant items in terms of 
absolute agreement. Thus, this index allows us to identify candidate discrepant and 
nondiscrepant items. However, by construction this index yields no information on the direction 
of any discrepant item identified. For this approach to yield useful data, we require the 
identification of discrepant items that largely index behaviors constrained to one context and not 
others (e.g., school but not home), and vice versa. This necessitates a second index that captures 
the direction of discrepant items. Specifically, one can identify discrepant and nondiscrepant 
items using the SMD effect size, using the following formula: 

$$SMD = \frac{X_{i1} - X_{i2}}{S_{pooled}} \qquad (2)$$

where $X_{i1}$ refers to the score for informant one (e.g., teacher), $X_{i2}$ refers to the score for 
informant two (e.g., parent), and $S_{pooled}$ reflects the pooled standard deviation across informants’ 
scores on the particular item. For the SMD, one can use Cohen’s (1988) criteria as a guideline: 
small SMD = 0.20 to 0.49; moderate SMD = 0.50 to 0.79; large SMD > 0.80. Based on these 
guidelines, one can identify nondiscrepant items as those that have an SMD below the threshold for 
a small effect (-0.20 < SMD < +0.20), indicating low discrepancy. The decision rule to identify 
discrepant items would be an SMD in the large range (above +0.80 or below -0.80). Discrepant items would be 
further decomposed by items for which informant one (e.g., teacher) had the larger score (SMD 
> +0.80), and items for which informant two (e.g., parent) had the larger score (SMD < -0.80). 
Using the first two formulas, one can see item identification begins with $r_{WG}$ to identify 
candidate discrepant and nondiscrepant items, and proceeds with SMD to accomplish two goals: 
(a) confirm identification of nondiscrepant items observed using $r_{WG}$ (i.e., items with $r_{WG}$ > .70 
and -0.20 < SMD < +0.20); and (b) identify context-specific discrepant items (i.e., items with 
scores for informant 1 > informant 2, and vice versa). However, a key limitation with relying on 
just these two indices is the lack of a confirmatory index for identifying directional discrepant 
items. Thus, we propose a third stage to the item identification process through the use of DIF 
analyses within item response theory models, which provide information about the properties of 
items via individual responses to those items. DIF analyses determine whether informants who 
rate the same level of a latent trait (behavior) differ in their probability of giving a certain response 
on behavior rating items (Andrich & Hagquist, 2012). As with the first two methods, the DIF 
method can be applied at the item level, namely comparing item characteristics across rating 
sources for each of the items previously identified as discrepant and nondiscrepant. Based on 
these tests, one can implement a transformed log-odds ratio with the Mantel-Haenszel procedure 
to quantify the DIF effect (see Andrich & Hagquist, 2012), using odds ratio thresholds for 
nondiscrepant (0.80 to 1.20) and discrepant (< 0.80 or > 1.20) items. Items that comprise the final 
pool to include in discrepant and nondiscrepant scales would consist of those items that “pass” 
identification thresholds across all three indices. 
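
The SMD and Mantel-Haenszel steps can be sketched together in Python. This is a minimal illustration under stated assumptions, not a definitive procedure: the SMD is computed from each informant's mean item score across students; the item is dichotomized (endorsed vs. not endorsed) for the Mantel-Haenszel odds ratio; strata are assumed to come from matching on a total score; and the delta rescaling (-2.35 times the natural log of the odds ratio) is shown as one common transformed log-odds ratio. All function names and data are hypothetical.

```python
from math import log, sqrt
from statistics import mean, variance

def item_smd(informant1, informant2):
    """SMD for one item: difference in mean scores divided by the pooled SD."""
    n1, n2 = len(informant1), len(informant2)
    pooled_var = ((n1 - 1) * variance(informant1) +
                  (n2 - 1) * variance(informant2)) / (n1 + n2 - 2)
    return (mean(informant1) - mean(informant2)) / sqrt(pooled_var)

def classify_smd(smd):
    """Decision rules from the text; values between the bands stay unclassified."""
    if -0.20 < smd < 0.20:
        return "nondiscrepant"
    if smd > 0.80:
        return "discrepant_informant1_higher"
    if smd < -0.80:
        return "discrepant_informant2_higher"
    return "unclassified"

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across strata of (a, b, c, d) counts.

    a / b = informant 1 endorses / does not endorse the item,
    c / d = informant 2 endorses / does not endorse the item.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

def mh_delta(odds_ratio):
    """Delta rescaling of the log-odds ratio (assumed transformation)."""
    return -2.35 * log(odds_ratio)

def classify_dif(odds_ratio):
    """Odds ratio thresholds from the text (0.80 to 1.20 = nondiscrepant)."""
    return "nondiscrepant" if 0.80 <= odds_ratio <= 1.20 else "discrepant"

# Hypothetical item scores: teachers rate the behavior higher than parents.
smd = item_smd([4, 5, 4, 5, 4, 5], [1, 2, 1, 2, 1, 2])
smd_label = classify_smd(smd)

# Hypothetical 2x2 counts in two total-score strata.
or_balanced = mantel_haenszel_odds_ratio([(10, 10, 10, 10), (20, 5, 20, 5)])
or_skewed = mantel_haenszel_odds_ratio([(15, 5, 5, 15), (18, 2, 6, 14)])
print(smd_label, classify_dif(or_balanced), classify_dif(or_skewed))
```

An item would enter the final pool only if its label agrees across the agreement index, the SMD, and the DIF odds ratio.
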
Phase 3: Testing the Ability of Multi-Informant Assessments to Sensitively Detect 
Contextual Variations in Outcomes of School-Based Services 

After developing multi-informant measures of psychosocial functioning that contain 
discrepant and nondiscrepant items, one key question arises: Are scales containing these items 
capable of detecting context-specificity in behaviors displayed by an individual student? To 
address this question, Phase 3 leverages widely used techniques that have recently gained 
currency in psychological assessment (for a review, see Youngstrom, 2013). In particular, 
receiver operating characteristic (ROC) methods may facilitate the kinds of personalized 
assessment outcomes that the item-sensitive scales described previously ought to yield (De Los 
Reyes et al., 2015; NIMH, 2015). One leverages ROC methods for a variety of reasons. In 
particular, ROC methods allow a user to identify specific scores on an instrument for detecting 
two different kinds of cases. First, one might use ROC methods to “rule in” or identify students 
who display a characteristic, such as students achieving positive response to an IEP, in an effort 
to optimize sensitivity of case identification. Second, ROC methods allow a user to identify 
specific scores on an instrument that allow one to “rule out” non-cases, such as a screening 
measure designed to identify students who do not require services, in an effort to optimize 
specificity of case identification. ROC methods involve testing links between measures of 
interest and their ability to either sensitively or specifically distinguish discrete events on a 
criterion variable, such as a student’s status on intervention response metrics or indicators of 
diagnostic status. Recent examples of applications of these ROC methods to psychological 
assessment exist elsewhere (e.g., Jarrett, Van Meter, Youngstrom, Hilton, & Ollendick, 2018). 
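
As a small illustration of how ROC methods tie specific scale scores to sensitivity and specificity, the Python sketch below evaluates every observed score as a candidate cutoff against a binary criterion. Selecting the cutoff by Youden's J (sensitivity + specificity - 1) is an assumption made for this example; one could instead fix a minimum sensitivity or a minimum specificity, depending on the decision at hand. The scores, criterion values, and function names are hypothetical.

```python
def sensitivity_specificity(scores, is_case, cutoff):
    """Treat score >= cutoff as a positive result; return (sensitivity, specificity)."""
    pairs = list(zip(scores, is_case))
    tp = sum(1 for s, y in pairs if s >= cutoff and y)
    fn = sum(1 for s, y in pairs if s < cutoff and y)
    tn = sum(1 for s, y in pairs if s < cutoff and not y)
    fp = sum(1 for s, y in pairs if s >= cutoff and not y)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(scores, is_case):
    """Return the observed score that maximizes Youden's J."""
    def youden(cutoff):
        sens, spec = sensitivity_specificity(scores, is_case, cutoff)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)

# Hypothetical scale scores and criterion status (1 = case, 0 = non-case).
scores = [1, 2, 3, 4, 5, 6, 7, 8]
is_case = [0, 0, 0, 1, 1, 0, 1, 1]

cutoff = best_cutoff(scores, is_case)
sens, spec = sensitivity_specificity(scores, is_case, cutoff)
print(cutoff, sens, spec)
```
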
Tests using ROC methods may facilitate use of the discrepant and nondiscrepant scales 
with individual students. In fact, initial uses of ROC methods may test the utility of the scales to 
facilitate decision-making during circumstances of their optimal implementation. For example, 
official records (e.g., grade retention, below-average test scores), tasks (e.g., family interaction 
tasks or naturalistic classroom observations), and standardized tests can all serve as criterion 
measures in ROC tests. Further, the kind of criterion measure one selects ought to dictate which 
of the discrepant or nondiscrepant scales would achieve the highest sensitivity and/or specificity. 
For example, a discrepant scale designed to assess intervention responses in students’ behaviors 
bound to the school context should display relatively higher sensitivity for detecting changes in 
school-specific performance, relative to the sensitivity metrics yielded for discrepant scales 
bound to the home context. Conversely, discrepant scales bound to the home context should 
outperform discrepant scales bound to the school context for home-specific criterion measures. 
Further, nondiscrepant scales should outperform any discrepant scale for criterion measures that 
index behaviors that display consistently across contexts. Should discrepant and nondiscrepant 
scales meet these expectations, specific scores could be identified that sensitively and/or 
specifically index students’ cross-contextual and context-specific behaviors. 
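
The expected pattern of scale performance can be checked with a simple comparison, sketched here in Python. As an assumption for illustration, the area under the ROC curve (AUC) stands in for the sensitivity comparisons described above, computed as the proportion of case and non-case pairs in which the case has the higher score. The scale names, scores, and criterion values are hypothetical.

```python
def auc(scores, is_case):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    cases = [s for s, y in zip(scores, is_case) if y]
    non_cases = [s for s, y in zip(scores, is_case) if not y]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))

# Hypothetical school-specific criterion (1 = impaired at school, 0 = not).
school_criterion = [1, 1, 1, 0, 0, 0]

# Hypothetical scores on a school-bound and a home-bound discrepant scale.
school_scale = [8, 7, 9, 2, 3, 1]
home_scale = [5, 2, 6, 5, 2, 6]

auc_school = auc(school_scale, school_criterion)
auc_home = auc(home_scale, school_criterion)
print(auc_school, auc_home)
```

Under the prediction in the text, the school-bound scale should discriminate better than the home-bound scale on a school-specific criterion, with the reverse holding for a home-specific criterion.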

Phase 4: Translating Scientific Findings into Routine Practice 

Implementation science. Building from the previous phases, Phase 4 of the research 
agenda focuses on transferring scientific findings into routine practice in schools. In particular, 
the transdisciplinary field of implementation science focuses on bridging the gap between 
research and practice through the development and testing of frameworks and strategies to 
improve the uptake of evidence-based assessment and intervention practices into service delivery 
settings (Lobb & Colditz, 2013). Generally, implementation science examines techniques and 
methods (Powell et al., 2015) that optimize both implementation outcomes (e.g., fidelity, reach, 
feasibility, sustainment; Proctor et al., 2011) and client outcomes in specific service settings 
(e.g., hospital, mental health clinics, schools; Aarons, Hurlburt, & Horwitz, 2011; Damschroder 
et al., 2009; Proctor et al., 2011; Proctor et al., 2009). In this respect, two efforts might be 
particularly fruitful to pursue. 

First, in this paper we reviewed an emerging but compelling body of research 
demonstrating the value of informant discrepancies for improving the interpretability of 
multi-informant assessments of students’ psychosocial functioning. In doing so, we hope to increase 
efforts to disseminate this knowledge to scholars and practitioners in the discipline of school 
psychology and seed ideas for new research. At the same time, how might one carry out this 
work in school-based service settings? What measurement paradigms and statistical techniques 
might one leverage to test research questions germane to issues surrounding informant 
discrepancies in school-based assessments? We might facilitate answering questions such as 
these by improving the synergy among scholars in school psychology, school professionals in the 
discipline, and scholars and professionals in disciplines where this kind of informant 
discrepancies work already occurs. Consequently, a key first step in implementation science 
endeavors should involve creating multi-disciplinary spaces, perhaps via pre-conference summits 
organized and held at professional conferences. These summits might focus on the sharing of 
ideas, resources, and strategies among scholars and professionals who regularly use 
multi-informant approaches to assess psychosocial functioning. These summits might result in 
cross-disciplinary collaborative teams. Members of these teams might establish research networks 
focused on addressing common questions surrounding informant discrepancies. We expect that 
establishing these collaborative networks will optimize the value of data gathered on these issues 
in school-based service settings, and at the same time reduce redundancies among research teams 
testing questions germane to informant discrepancies. 

Second, implementation research often involves conducting investigations of typical or 
usual practice. As we noted previously, collecting multi-informant data already comprises a key 
component of “best practices” in school-based assessments (e.g., Reynolds et al., 2006; Salvia 
et al., 2012). Yet, it remains an empirical question how professionals in school-based service 
settings use multi-informant data to make decisions regarding educational programming. That is, 
do school professionals as part of routine practice make these decisions in a way that is sensitive 
to the contexts in which students require services and/or improve as a consequence of receiving 
services? For example, when school professionals encounter discrepancies between a parent’s 
and teacher’s reports of a student’s psychosocial functioning, do they interpret these 
discrepancies as reflecting differences between how that student behaves at home versus school? 
Alternatively, do professionals often make decisions on the assumption that when they 
observe informant discrepancies, this signals that one or more of the informants provided reports 
of questionable precision or accuracy? Research focused on attaining estimates of how school 
professionals currently use multi-informant data to make decisions about educational 
programming comprises a necessary step toward understanding potential barriers in developing 
multi-informant assessments along the lines that we propose. 

Implementation practice. Implementation practice is closely tied to implementation 
science and involves applying the knowledge generated from research to effectively install 
quality practices as part of routine service delivery. We need innovative research that examines 
how to address the implementation gap, in order to maximize both the efficient use of 
evidence-based assessment and intervention practices, and the impact of these practices on the students 
who could benefit from research on informant discrepancies (O’Connell, Boat, & Warner, 2009). 
Indeed, although significant attention has been devoted to cataloging barriers/facilitators 
(Damschroder et al., 2009) and identifying implementation strategies (Waltz et al., 2015), there 
is a need for research that develops and tests innovative strategies to enhance successful 
administration and use of evidence-based assessment practices. In schools, implementation 
initiatives often occur as top-down mandates without attention paid to the individual factors 
found to impact practitioner behavior change (e.g., perceptions and intentions influencing 
implementation fidelity). In the case of multi-informant assessments, failing to address 
individual-level barriers (e.g., knowledge, motivation, self-efficacy) may be costly. That is, 
individual behavior change is ultimately required for practitioners—school psychologists, special 
educators, and educational diagnosticians—to incorporate scientific findings on informant 
discrepancies into practice, even when organizational factors such as evidence-informed policy, 
supportive leadership, and effective training are in place (Michie, van Stralen, & West, 2011). 
Thus, we need to investigate ways of effectively supporting practitioner behavior change to 
ensure successful transfer of knowledge on informant discrepancies. 

Regarding practitioner behavior change, we might consider recent efforts focused on 
addressing long-standing barriers in assessment practices, albeit outside the realm of informant 
discrepancies research. Specifically, scholars and professionals focused on assessing and 
diagnosing pediatric bipolar disorder have long raised concerns about the lack of standardized 
approaches to training professionals on how to assess and diagnose the condition (for a review, 
see Youngstrom et al., 2017). In fact, the wide variation in training might very well account for 
the relatively low rates of inter-rater reliability often observed among assessors’ diagnostic 
decisions (Youngstrom, Halverson, Youngstrom, Lindhiem, & Findling, 2018). Some of the 
issues stemming from current practices can be addressed, in part, through short workshop 
training programs focused on behavior change. One program involves providing professionals 
with tools for making informed diagnostic decisions based on standardized instruments (Jenkins, 
Youngstrom, Washburn, & Youngstrom, 2011). Another program from the same team focuses 
on reducing decision-making biases and errors commonly found when professionals diagnose the 
condition (Jenkins & Youngstrom, 2016). Researchers designed both of these programs to be (a) 
brief; (b) scalable for use either online (Jenkins & Youngstrom, 2016), or within the workshop 
structures of professional meetings (Jenkins et al., 2011); and (c) interactive such that attendees 
both learn procedures and practice their implementation within the same program sequence. 

We review this example to highlight approaches that research teams might modify for use 
in dealing with potential barriers to using and interpreting multi-informant assessments. For 
instance, these training programs might focus on providing professionals with tools for 
testing the meaning of informant discrepancies when they encounter them. The tools might involve 
strategic use of independent assessments of behaviors (e.g., home and school observations) to 
decipher whether a particular instance of discrepant reports (e.g., teachers report student 
behavior that is not corroborated by parent reports) occurred because the behaviors manifest in 
one context but not the other. Essentially, a key next step in implementation practice might 
involve creating feasible (i.e., brief, inexpensive), scalable programs designed to facilitate 
application of the Operations Triad Model (De Los Reyes, Thomas, et al., 2013) in school-based 
service settings. In sum, we expect a combination of research endeavors designed to impact both 
implementation science and implementation practice to facilitate adoption and use of the 
innovative multi-informant assessments we propose. 
Concluding Comments 

Across multiple disciplines tasked to deliver and study services for assessing and 
intervening upon children’s psychosocial functioning, a key challenge is that the foci of these 
services are moving targets. The constituent behaviors that reflect psychosocial functioning vary 
widely in their displays from context to context and, in the school setting, among different 
students. Professionals in school-based service and research settings go to great lengths to account 
for all of this complexity by, among other practices, collecting data about students’ psychosocial 
functioning from multiple key figures in students’ lives. We ask students how they feel, how 
they behave, what they need. We ask their teachers and parents for their thoughts on these 
matters. These data tell us a lot about students. Yet, when each piece of data points us in a 
different direction, we realize that we need to learn more in order to make the right decisions for 
students and the services required to meet their needs. Stated otherwise, we need a better 
“roadmap.” 


We reviewed a body of work, occurring largely outside of school-based service delivery 
and research, focused on what we can learn from the discrepancies between informants tasked 
to report on students’ psychosocial functioning. We used this overview to inform the 
development of a four-phase research agenda designed for direct application to understanding 
these informant discrepancies as they manifest in school-based service and research settings. We 
have a lot of work ahead of us. We need new studies. We need new measures. We perhaps need 
to rethink the findings of prior studies and refine existing measures. We are also very curious 
about what lies ahead regarding research on multi-informant assessments. After reading our 
paper, we hope you are too. 

Figure 1. Graphical representation of the research concepts that comprise the Operations Triad 
Model. Originally published in De Los Reyes, Thomas, et al. (2013). © Annual Review of Clinical 
Psychology. Copyright 2012 Annual Reviews. All rights reserved. The Annual Reviews logo, and 
other Annual Reviews products referenced herein are either registered trademarks or trademarks 
of Annual Reviews. All other marks are the property of their respective owner and/or licensor. 

A priori hypothesis regarding informants’ reports 

+ Informant correspondence = Conduct tests for Converging Operations 
+ Informant discrepancies = Conduct tests for Diverging Operations 

Converging Operations 

+ Do the empirical observations drawn from multiple informants’ reports reach an a priori 
  threshold of consistency in research conclusions? 
- A “yes” = evidence supportive of Converging Operations. 
- A “no” = post-hoc tests for Compensating Operations. 

Diverging Operations 

+ Do the informants’ reports reach a priori thresholds of measure reliability in the sample in 
  which one observes informant discrepancies? 
+ Do the informants’ reports relate to measures of other constructs identified a priori as 
  supportive evidence of measure validity? 
+ Can one rule out methodological factors as explanations of informant discrepancies? 
+ If so, do the discrepancies exhibit systematic variation germane to the behaviors for which 
  informants provide reports? 
- “Yes” responses to all of the above questions = evidence supportive of Diverging Operations. 
+ Conclusion: One must incorporate meaningful interpretations of the informant discrepancies 
  into the conceptualization of the assessed behaviors. 
- A “No” response to any of the above questions = post-hoc tests for Compensating Operations. 

Compensating Operations 

+ Does the evidence reveal that one or more informants provide unreliable reports? 
+ Does the evidence reveal that one or more informants’ reports are not valid? 
+ Does the evidence reveal that methodological factors account for informant discrepancies? 
+ Affirmative responses to the above questions = evidence supportive of Compensating Operations. 
+ Conclusion: One can interpret informant discrepancies as resulting from psychometric issues 
  and thus can be justified in using statistical or methodological techniques to compensate for 
  the lack of convergence. 

Figure 2. Graphical display of decision-making processes based on the Operations Triad Model. 
Originally published in De Los Reyes, Thomas, et al. (2013). © Annual Review of Clinical 
Psychology. Copyright 2012 Annual Reviews. All rights reserved. The Annual Reviews logo, and 
other Annual Reviews products referenced herein are either registered trademarks or trademarks 
of Annual Reviews. All other marks are the property of their respective owner and/or licensor. 
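
The decision process summarized in Figure 2 can also be read as a short branching procedure. The Python sketch below is only an illustration and is not part of De Los Reyes, Thomas, et al. (2013). The names `Answers`, `NextStep`, and `next_step` are hypothetical, and reducing each question to a single yes/no answer is a simplifying assumption; in practice each answer would rest on a priori thresholds and empirical tests. The sketch covers the Converging and Diverging branches and stops at the point where post-hoc tests for Compensating Operations would begin.

```python
from dataclasses import dataclass
from enum import Enum


class NextStep(Enum):
    CONVERGING_SUPPORTED = "evidence supportive of Converging Operations"
    DIVERGING_SUPPORTED = "evidence supportive of Diverging Operations"
    TEST_COMPENSATING = "run post-hoc tests for Compensating Operations"


@dataclass
class Answers:
    """Yes/no answers to the Figure 2 questions (hypothetical field names)."""
    expect_correspondence: bool             # a priori hypothesis: informants should agree
    consistent_conclusions: bool = False    # Converging Operations question
    reliable: bool = False                  # Diverging question: reliability thresholds met
    valid: bool = False                     # Diverging question: validity evidence present
    method_factors_ruled_out: bool = False  # Diverging question: method factors ruled out
    systematic_variation: bool = False      # Diverging question: systematic variation


def next_step(a: Answers) -> NextStep:
    """Follow the branch selected by the a priori hypothesis."""
    if a.expect_correspondence:
        if a.consistent_conclusions:
            return NextStep.CONVERGING_SUPPORTED
        return NextStep.TEST_COMPENSATING
    # Diverging branch: every question must be answered "yes".
    if all((a.reliable, a.valid, a.method_factors_ruled_out, a.systematic_variation)):
        return NextStep.DIVERGING_SUPPORTED
    return NextStep.TEST_COMPENSATING


# Example: discrepant teacher and parent reports where all four checks pass.
example = Answers(expect_correspondence=False, reliable=True, valid=True,
                  method_factors_ruled_out=True, systematic_variation=True)
print(next_step(example).value)
```

A single "no" on the Diverging branch sends the assessor to the Compensating Operations questions, which mirrors the "any of the above" wording in the figure.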

Phase 1: Generalizability of the Operations Triad Model to School-Based Assessments 

Testing whether the Operations Triad Model generalizes to school-based assessments of 
students’ psychosocial functioning. 

Phase 2: Strategies for Optimizing Multi-Informant Assessments to Yield Interpretable Data 
about Informant Discrepancies in School-Based Services and Research 

Efforts to develop and/or refine multi-informant measures of student psychosocial 
functioning. 

Phase 3: Testing the Ability of Multi-Informant Assessments to Sensitively Detect 
Contextual Variations in Outcomes of School-Based Services 

Efforts to determine whether scales developed in previous phases are capable of 
detecting context-specific behaviors exhibited by an individual student. 

Phase 4: Translating Scientific Findings into Routine Practice 

Efforts to transfer scientific findings from previous phases into routine practice in 
schools, with the goal of maximizing the likelihood that students benefit from 
innovations in multi-informant assessments. 

Figure 3. Graphical display of the four phases of the proposed research agenda for informant 
discrepancies in school-based services and research. 


